
Switch Dockerfile image to wolfi and add pipeline for vulnerability scanning #3063

Merged
merged 9 commits into main from kostasb/switch-dockerfile-image on Jan 20, 2025

Conversation

@kostasb (Contributor) commented Dec 27, 2024

https://github.com/elastic/search-developer-productivity/issues/3547

Description

The goal of this PR is to:

  1. move "extensible Dockerfiles" to a distroless base image. Dockerfile and Dockerfile.ftest are not used in Elastic's docker image releases but are rather provided as a stepping stone for users to build custom images on.
  2. add a pipeline for vulnerability scanning of built images

This is based on @acrewdson's draft. It uses wolfi-base images and adds Buildkite pipelines to build, test, and scan the resulting Docker images using Trivy.

A notification method for vulnerability reports from Trivy is TBD.
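For orientation, here is a minimal sketch of what a wolfi-based extensible Dockerfile can look like. This is not the exact file from this PR: the apk line mirrors a step quoted later in this thread, while the install command, paths, entry point, and non-root user are illustrative assumptions.

# Minimal sketch only -- not the exact Dockerfile from this PR.
FROM cgr.dev/chainguard/wolfi-base
ARG python_version=3.11
# wolfi uses apk, so OS packages are added with `apk add`
RUN apk add --no-cache python3=~${python_version} make git
COPY . /app
WORKDIR /app
# hypothetical install step -- substitute the project's actual one
RUN make clean install
# drop root for runtime; assumes a non-root user exists in the base image
USER nonroot
CMD [".venv/bin/elastic-ingest"]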

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • if you added or changed Rich Configurable Fields for a Native Connector, you made a corresponding PR in Kibana

Changes Requiring Extra Attention

  • Security-related changes (encryption, TLS, SSRF, etc)
  • New external service dependencies added.

Release Note

The OSS Dockerfile provided in the connectors repo has been updated to use a different base image for improved security. Users building custom docker images based on this Dockerfile may have to review their configuration for compatibility with the new base image.

@acrewdson (Contributor) left a comment:

great start! I added a few comments

schedules:
  Daily main:
    branch: main
    cronline: '@daily'
Contributor:

wondering if we should consider running this less frequently? I think it would depend on what @elastic/search-extract-and-transform prefer, but the scenario I picture is something like:

  • this job runs once a week, or maybe even once every two weeks
  • if Trivy identifies any vulnerabilities, a message is sent to a Slack channel, or a GitHub issue is created

let's see what folks think 👍

Contributor:

  • if Trivy identifies any vulnerabilities, a message is sent to a Slack channel, or a GitHub issue is created

I think these notifications could potentially be a duplicate of existing Snyk issues detected on the official container image. What's the value added by these new notifications?

Member:

is Snyk going to scan the images produced by these CI jobs?
I think this PR is doing 3 things, but maybe doesn't need to.

  1. switches our Dockerfile to use chainguard (has customer value)
  2. Adds CI to validate the Dockerfile works (has customer value)
  3. Adds CI to fail if a vuln is found (internal value, with indirect customer value)

I think (3) could probably be done separately, and reuse whatever machinery we use to scan other artifacts, especially since (1) and (2) will significantly cut down on false-positives that we'd have if we tried to do (3) on its own today.

@kostasb (Contributor, Author) commented Jan 2, 2025:

The CI jobs in this PR do not push images to the Docker registry; instead, they rely on Trivy for vulnerability scanning, which runs within the context of the pipeline.

The alternative approach is to push the images to an internal namespace on the Docker registry and request that these be added to Snyk. This would result in duplicate reports, though, as we're already scanning docker.elastic.co/integrations/elastic-connectors with Snyk, which is built from Dockerfile.wolfi and is technically the exact same base OS build.

Contributor:

What's the value added by these new notifications?

...

which is technically the exact same base OS build.

These are really good points that I hadn't considered. Given that with this PR we are:

  • switching the Dockerfile to become distroless, and making the resultant image considerably more secure
  • making the Dockerfile essentially the same as Dockerfile.wolfi, the latter of which produces the images that are pushed and scanned by Snyk
  • adding nothing in the Dockerfile except the bare minimum that we need to run connectors (assuming we get rid of git and make 👍)

... then I actually don't see much value in adding the Trivy step (let alone adding notifications based on its results). I wish I'd had these realizations sooner, apologies @kostasb.

What do you all think?

Contributor:

@kostasb thanks, I like the idea of leaving the Trivy step there, but not having it send any notifications. But I'll defer to E&T folks (@seanstory @artem-shelkovnikov) to say what they prefer 👍

Member:

I have no objection to leaving it (without notifications), but I also don't think a human will be manually looking at the outputs often. So if we're OK with treating it essentially as a source of debug logs that may or may not be examined, great.

Contributor:

I understand the following might be a big direction change for this PR, but considering the feedback above, I think it might be worth exploring: what if, instead of adding a new pipeline (with informative value only), we add the Trivy scan step to the existing pipeline? I think the informative value would be higher for developers working in PRs: the Trivy scan would be part of their feedback loop, and not something relegated to a different pipeline.

Similarly to what we did for ent-search, I would run the Trivy scan step in PRs to main, and on merges to main and other current release branches. What do you all think?

Contributor (Author):

Merging the extensible Dockerfile build/test/scan steps into the main pipeline makes sense to me; we can proceed this way if the team agrees.

@kostasb (Contributor, Author) commented Jan 10, 2025:

I pushed a commit that merges the extensible Dockerfiles pipeline into the main pipeline (easy to revert if the team @seanstory @artem-shelkovnikov has a different opinion after all).

As a bonus, this also enables testing as part of this PR (a new dedicated pipeline wouldn't be picked up until after it was merged).

@kostasb (Contributor, Author) commented Dec 30, 2024:

Since in this PR we are building a Dockerfile on a different base image (cgr.dev/chainguard/wolfi-base), there are package discrepancies compared to the base image used for the official image builds, wolfi/python:3.11-dev.

For reference, here is a list of packages which, among others (like bash and curl), are available in the wolfi python-3.11 image used for the production image builds.

Do we want this list of packages added to the wolfi-base image? Do connectors require some, or all, of them? @elastic/search-extract-and-transform

@artem-shelkovnikov (Member):

I'm not sure if we need the packages mentioned; we really only need the dependencies for Python 3.11 + base64 available.

@kostasb force-pushed the kostasb/switch-dockerfile-image branch from 84d8123 to cdfd17b on December 31, 2024 15:03
@kostasb (Contributor, Author) commented Dec 31, 2024:

I'm not sure if we need the packages mentioned; we really only need the dependencies for Python 3.11 + base64 available.

Thanks, then I'll stick with the bare minimum packages and we can iterate if needed.

I added both amd64 and arm64 image builds, mainly for the purpose of smoke testing (with .buildkite/publish/test-docker.sh) on both architectures. I don't expect deviations in the vulnerability testing between architectures.

Do we need some sort of release note for the new base image? It could be a "breaking" change for those users who build their images on top of our Dockerfiles.

@kostasb force-pushed the kostasb/switch-dockerfile-image branch from 25f6298 to a1b50bd on December 31, 2024 17:33
@kostasb marked this pull request as ready for review on December 31, 2024 17:34
@kostasb requested a review from a team as a code owner on December 31, 2024 17:34
@kostasb (Contributor, Author) commented Jan 2, 2025:

Update re. the packages included with the base image vs. the python one that we build our Docker images on: @oli-g suggested checking whether installing python-3.11 on the wolfi-base image installs those as dependencies. It does install most of them as part of the Dockerfile step that adds python (RUN apk add --no-cache python3=~${python_version} make git), with some exceptions (linux-headers) which are unlikely to be relevant for connectors. That said, there shouldn't be any concerns regarding the base image diff.

@artem-shelkovnikov (Member):

It could be a "breaking" change for those users who build their images on top of our Dockerfiles.

Just to check - cgr.dev/chainguard/wolfi-base is publicly available, right?

@kostasb (Contributor, Author) commented Jan 2, 2025:

It could be a "breaking" change for those users who build their images on top of our Dockerfiles.

Just to check - cgr.dev/chainguard/wolfi-base is publicly available, right?

Yes, we've confirmed that this is a publicly available image that can be downloaded by unauthenticated (not logged in to any registry) docker instances.

@artem-shelkovnikov (Member):

Do we need some sort of release note for the new base image? It could be a "breaking" change for those users who build their images on top of our Dockerfiles.

We can mention it in our release notes indeed. I don't expect a lot of customers to build custom docker images, and likely it's just gonna work for them, but it would not hurt to mention that we've updated our docker images to make them more secure, with potentially incompatible changes that customers will have to fix.

@kostasb (Contributor, Author) commented Jan 3, 2025:

Do we need some sort of release note for the new base image? It could be a "breaking" change for those users who build their images on top of our Dockerfiles.

We can mention it in our release notes indeed. I don't expect a lot of customers to build custom docker images, and likely it's just gonna work for them, but it would not hurt to mention that we've updated our docker images to make them more secure, with potentially incompatible changes that customers will have to fix.

Thank you; I added a release note to the PR description.

@kostasb added the v8.18.0 label and removed the v9.0.0 label on Jan 7, 2025
@kostasb force-pushed the kostasb/switch-dockerfile-image branch from 0e15a7a to c9a5e46 on January 15, 2025 14:46
@kostasb (Contributor, Author) commented Jan 15, 2025:

The pipeline steps have been tested successfully.

The OSS (extensible) Dockerfile (and Dockerfile.ftest) steps are added as dependencies for the main pipeline, so the build will fail if they don't work, making them part of the release process.

As discussed, Trivy scans are included for informative and logging purposes but won't cause the pipeline to fail, since we're already getting alerts on equivalent images from Snyk.

I am planning to tag this against 8.18.0 without backporting, because I consider it too big a change for a patch release and we want to include a clear release note with it.

Let me know if you have any objections or comments.

@kostasb requested reviews from acrewdson and oli-g on January 15, 2025 14:59
acrewdson previously approved these changes Jan 15, 2025
@acrewdson (Contributor) left a comment:

great work 🚀

I'll let E&T folks review and give the go-ahead on merging

thank you for doing this!

@oli-g (Contributor) left a comment:

I think as a user it would be clearer to read oss instead of extensible: what do you all think? I see other Elastic products doing something similar, see here for example.

I also left a couple of comments related to the following: I'm proposing to not build, test and scan the artifacts produced from Dockerfile.ftest, as they're too similar to the artifacts produced from Dockerfile.

Comment on lines 697 to 742
# ----
# Extensible Dockerfile.ftest build, tests and vulnerability scan on amd64
# ----
- label: "Building amd64 Docker image from extensible Dockerfile.ftest"
  agents:
    provider: aws
    instanceType: m6i.xlarge
    imagePrefix: ci-amazonlinux-2
  env:
    ARCHITECTURE: "amd64"
    DOCKERFILE_PATH: "Dockerfile.ftest"
    DOCKER_IMAGE_NAME: "docker.elastic.co/ci-agent-images/elastic-connectors-extensible-dockerfile-ftest"
    DOCKER_ARTIFACT_KEY: "elastic-connectors-extensible-dockerfile-ftest"
  command: ".buildkite/publish/build-docker.sh"
  key: "build_extensible_dockerfile_ftest_image_amd64"
  artifact_paths: ".artifacts/*.tar.gz"
- label: "Testing amd64 image built from extensible Dockerfile.ftest"
  agents:
    provider: aws
    instanceType: m6i.xlarge
    imagePrefix: ci-amazonlinux-2
  env:
    ARCHITECTURE: "amd64"
    DOCKERFILE_PATH: "Dockerfile.ftest"
    DOCKER_IMAGE_NAME: "docker.elastic.co/ci-agent-images/elastic-connectors-extensible-dockerfile-ftest"
    DOCKER_ARTIFACT_KEY: "elastic-connectors-extensible-dockerfile-ftest"
  depends_on: "build_extensible_dockerfile_ftest_image_amd64"
  key: "test_extensible_dockerfile_ftest_image_amd64"
  commands:
    - "mkdir -p .artifacts"
    - buildkite-agent artifact download '.artifacts/*.tar.gz*' .artifacts/ --step build_extensible_dockerfile_ftest_image_amd64
    - ".buildkite/publish/test-docker.sh"
- label: "Trivy Scan amd64 extensible Dockerfile.ftest image"
  timeout_in_minutes: 10
  depends_on:
    - test_extensible_dockerfile_ftest_image_amd64
  key: "trivy-scan-amd64-extensible-dockerfile-ftest-image"
  agents:
    provider: k8s
    image: "docker.elastic.co/ci-agent-images/trivy:latest"
  command: |-
    mkdir -p .artifacts
    buildkite-agent artifact download '.artifacts/*.tar.gz*' .artifacts/ --step build_extensible_dockerfile_ftest_image_amd64
    trivy --version
    env | grep TRIVY
    find .artifacts -type f -name '*.tar.gz*' -exec trivy image --quiet --input {} \;
Contributor:

I think we are not going to get a lot of value from these steps, given that Dockerfile.ftest is almost equivalent to Dockerfile.

So I'm wondering: what if instead (maybe in a different PR, as a follow-up) we delete the two .ftest Dockerfiles and try to live with only two Dockerfiles instead of four (Dockerfile and Dockerfile.wolfi), and we move the RUN .venv/bin/pip install -r requirements/ftest.txt command out of the Dockerfiles, to wherever we actually run those specs?

Contributor (Author):

That makes sense; I'll remove the ftest builds/tests/scans from this pipeline and follow up with an issue about replacing/removing the ftest Dockerfiles.

@kostasb (Contributor, Author) commented Jan 16, 2025:

@oli-g I just wanted to bring up an issue I foresee with this suggestion:

In the current PR we remove make and then run the image as nonroot, which doesn't allow installing packages beyond what's already available.

We won't be able to run the functional test steps on top of this image, as these require make. We'd need to build another container image layer, with this Dockerfile as the base, to re-install make in order to run the functional tests.
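For illustration only, a hypothetical sketch of such an extra layer (the base image tag and the nonroot user are placeholders, and the pip command is the one quoted above, not something this PR adds; it assumes the WORKDIR and virtualenv carry over from the base image):

# Hypothetical follow-up layer -- not part of this PR.
# Builds on top of an image produced from the OSS Dockerfile (placeholder tag)
# and temporarily switches back to root so apk can re-add make.
FROM elastic-connectors-oss:local
USER root
RUN apk add --no-cache make
# install the functional-test dependencies into the existing virtualenv
RUN .venv/bin/pip install -r requirements/ftest.txt
# drop privileges again before running the tests
USER nonroot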

Contributor:

Got it, I missed this "little detail"... thank you! I guess we'll have to find a different approach, or leave things as they are for now. We'll discuss it in the new ticket.

@kostasb force-pushed the kostasb/switch-dockerfile-image branch from d39e558 to bfd0ef8 on January 16, 2025 11:12
@kostasb (Contributor, Author) commented Jan 16, 2025:

I think as a user it would be clearer to read oss instead of extensible: what do you all think? I see other Elastic products doing something similar, see here for example.

I also left a couple of comments related to the following: I'm proposing to not build, test and scan the artifacts produced from Dockerfile.ftest, as they're too similar to the artifacts produced from Dockerfile.

I agree, and I adapted the PR to remove ftest from the pipeline and to refer to the Dockerfile as OSS instead of extensible.
I share the same thoughts regarding the ftest Dockerfiles, so we can follow up on replacing them in a dedicated issue.

@oli-g (Contributor) left a comment:

Thank you for this work!

@kostasb (Contributor, Author) commented Jan 17, 2025:

Let me know if there are any objections to merging this PR for 8.18.0 without backporting to previous minor releases, so that it doesn't show up in a patch release. I consider this a potentially breaking change for some users, as reflected by the release note in the PR description.

If no concerns until then, I plan to merge on Monday, Jan 20.

@kostasb merged commit f175163 into main on Jan 20, 2025
2 checks passed
@kostasb deleted the kostasb/switch-dockerfile-image branch on January 20, 2025 11:51